Picking single-nucleotide polymorphisms in forests

Authors

  • Daniel F Schwarz
  • Silke Szymczak
  • Andreas Ziegler
  • Inke R König
Abstract

With the development of high-throughput single-nucleotide polymorphism (SNP) technologies, the vast number of SNPs in smaller samples poses a challenge to the application of classical statistical procedures. A possible solution is to use a two-stage approach for case-control data in which, in the first stage, a screening test selects a small number of SNPs for further analysis. The second stage then estimates the effects of the selected variables using logistic regression (logReg). Here, we introduce a novel approach in which the selection of SNPs is based on the permutation importance estimated by random forests (RFs). For this, we used the simulated data provided for the Genetic Analysis Workshop 15 without knowledge of the true model. The data set was randomly split into a first and a second data set. In the first stage, RFs were grown to pre-select the 37 most important variables, and these were reduced to 32 variables by haplotype tagging. In the second stage, we estimated parameters using logReg. The highest effect estimates were obtained for five simulated loci. We detected smoking, gender, and the parental DR alleles as covariates. After correction for multiple testing, we identified two out of four genes simulated with a direct effect on rheumatoid arthritis risk and all covariates without any false positives. We showed that a two-stage approach with a screening of SNPs by RFs is suitable to detect candidate SNPs in genome-wide association studies for complex diseases.
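The first-stage screening described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it replaces full random forests with bagged single-split "stumps" to stay short, computes permutation importance on out-of-bag samples, and uses toy data in which only SNP 0 carries signal. The function names, the genotype coding (0/1/2), the data sizes, and the choice of the top 3 SNPs are all assumptions made for illustration. The second stage (logistic regression on the held-out half) is not shown.

```python
import random

def fit_stump(X, y, rows, features):
    """Best single-SNP split (genotype <= t) on the given bootstrap rows."""
    best = (-1, None)
    for j in features:
        for t in (0, 1):  # genotypes are assumed to be coded 0/1/2
            sides = {True: [], False: []}
            for i in rows:
                sides[X[i][j] <= t].append(y[i])
            if not sides[True] or not sides[False]:
                continue  # split leaves one side empty
            # Majority label on each side (ties go to label 1)
            label = {s: int(2 * sum(v) >= len(v)) for s, v in sides.items()}
            correct = sum(label[X[i][j] <= t] == y[i] for i in rows)
            if correct > best[0]:
                best = (correct, (j, t, label[True], label[False]))
    return best[1]

def oob_permutation_importance(X, y, n_trees=300, mtry=5, seed=0):
    """Mean drop in out-of-bag accuracy after permuting each SNP."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    importance = [0.0] * p
    for _ in range(n_trees):
        boot = [rng.randrange(n) for _ in range(n)]   # bootstrap sample
        oob = sorted(set(range(n)) - set(boot))       # out-of-bag rows
        stump = fit_stump(X, y, boot, rng.sample(range(p), mtry))
        if stump is None or not oob:
            continue
        j, t, left, right = stump
        base = sum((left if X[i][j] <= t else right) == y[i] for i in oob)
        # A stump uses one SNP only, so only that SNP needs permuting
        shuffled = [X[i][j] for i in oob]
        rng.shuffle(shuffled)
        perm = sum((left if v <= t else right) == y[i]
                   for v, i in zip(shuffled, oob))
        importance[j] += (base - perm) / len(oob)
    return [v / n_trees for v in importance]

# Toy data (hypothetical): 400 subjects, 20 SNPs, only SNP 0 affects status
rng = random.Random(1)
n, p = 400, 20
X = [[rng.randrange(3) for _ in range(p)] for _ in range(n)]
y = [int(row[0] >= 1) if rng.random() > 0.1 else rng.randrange(2) for row in X]

# Stage 1 on the first half; the second half is kept for stage 2
half = n // 2
scores = oob_permutation_importance(X[:half], y[:half])
ranking = sorted(range(p), key=lambda j: scores[j], reverse=True)
selected = ranking[:3]
stage2_X = [[row[j] for j in selected] for row in X[half:]]
print("SNPs passed to stage 2:", selected)
```

Keeping the two halves separate means the effect estimates in the second stage are not computed on the same subjects that drove the selection.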


Related articles

Single Nucleotide Polymorphisms and Association Studies: A Few Critical Points

Uncovering DNA sequence variations that correlate with phenotypic changes, e.g., diseases, is the aim of sequence variation studies. A common type of sequence variation is the single nucleotide polymorphism (SNP, pronounced snip). SNPs are the third-generation molecular marker. An SNP represents a DNA sequence variant of a single base pair with the minor allele occurring in more than 1% of a given popula...


Association of two single nucleotide polymorphisms rs10407022 and rs3741664 with the risk of primary ovarian insufficiency in a sample of Iraqi women

Primary ovarian insufficiency (POI) can be a devastating disease impacting women below the age of forty. This involves a major decrease in the amount and quality of oocytes, or ovarian reserve in a woman. The distribution of single-nucleotide polymorphisms, rs10407022 and rs3741664, in Iraqi people and its association with primary ovarian insufficiency is the main objective of this study. The m...


In-silico study to identify the pathogenic single nucleotide polymorphisms in the coding region of CDKN2A gene

Background: CDKN2A, encoding two important tumor suppressor proteins p16 and p14, is a tumor suppressor gene. Mutations in this gene and subsequently the defect in p16 and p14 proteins lead to the downregulation of RB1/p53 and cancer malignancy. To identify the structural and functional effects of mutations, various powerful bioinformatics tools are available. The aim of this study is the ident...


The Single Nucleotide Polymorphisms in the C-reactive Protein Gene: are they Biomarkers of Cardiovascular Risk?

Recent pre-clinical and clinical studies have revealed that the C-reactive protein gene (CRP) is related to the degree of acute rise in plasma C-reactive protein (CRP) levels. Moreover, single nucleotide polymorphisms (SNPs) in the CRP gene could be associated with increased risk of cancer, atherosclerosis, diabetes mellitus, bowel disease, rheumatoid arthritis, psoriasis, obstructive pulmonary disease,...


No association between single nucleotide polymorphisms in pre-miRNAs and the risk of gastric cancer in Chinese population

Objective(s): Accumulating evidence has demonstrated that miRNAs contribute to various genetic and epigenetic modifications in the pathogenesis of gastric cancer (GC). Recent studies focused on the four single nucleotide polymorphisms (SNPs) of pre-miRNAs including rs11614913, rs3746444, rs2910164, and rs2292832. It was suggested that these four SNPs were significantly associated with the risk ...


Journal title:
  • BMC Proceedings

Volume 1  Issue

Pages  -

Publication date 2007